60 research outputs found

    Combinatorial algorithm for counting small induced graphs and orbits

    Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The algorithm requires the existence of a vertex with certain properties; we show that such a vertex exists for graphlets of arbitrary size, except for complete graphs and $C_4$, which are treated separately. Empirical analysis of running time agrees with the theoretical results.
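
The idea of relating orbit counts through linear equations can be illustrated on the smallest case, 3-node graphlets. The sketch below is not the algorithm from the paper; it is a minimal illustration under the assumption that the graph is given as a dict of neighbour sets (the names `node_orbits_3` and `brute_force_orbits_3` are hypothetical). Only triangles are counted directly; the two path orbits then follow from two equations: the number of neighbour pairs of v equals o2 + o3, and the sum of (deg(u) - 1) over neighbours u of v equals o1 + 2·o3.

```python
from itertools import combinations


def node_orbits_3(adj):
    """Return {v: (o1, o2, o3)} for 3-node graphlets.

    o1: v is an end of an induced 2-edge path
    o2: v is the centre of an induced 2-edge path
    o3: v lies in a triangle
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    result = {}
    for v, ns in adj.items():
        # Triangles through v are the only count obtained by enumeration.
        o3 = sum(1 for u, w in combinations(ns, 2) if w in adj[u])
        # Every pair of neighbours is either adjacent (triangle) or not (v is a path centre).
        o2 = deg[v] * (deg[v] - 1) // 2 - o3
        # Every walk v-u-w with w != v either ends a path or closes a triangle;
        # each triangle is reached twice, once through each of its other two nodes.
        o1 = sum(deg[u] - 1 for u in ns) - 2 * o3
        result[v] = (o1, o2, o3)
    return result


def brute_force_orbits_3(adj):
    """Reference implementation: inspect every node triple directly."""
    counts = {v: [0, 0, 0] for v in adj}
    for triple in combinations(sorted(adj), 3):
        inner = {x: sum(1 for y in triple if y in adj[x]) for x in triple}
        edges = sum(inner.values()) // 2
        if edges == 3:
            for x in triple:
                counts[x][2] += 1
        elif edges == 2:
            for x in triple:
                counts[x][1 if inner[x] == 2 else 0] += 1
    return {v: tuple(c) for v, c in counts.items()}


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
orbits = node_orbits_3(example)
```

For larger graphlets the same principle applies with more equations, but the relations are correspondingly more involved than the two used here.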

    Computation of Graphlet Orbits for Nodes and Edges in Sparse Graphs

    Graphlet analysis is a useful tool for describing local network topology around individual nodes or edges. A node or an edge can be described by a vector containing the counts of different kinds of graphlets (small induced subgraphs) in which it appears, or the "roles" (orbits) it has within these graphlets. We implemented an R package with functions for fast computation of such counts on sparse graphs. Instead of enumerating all induced graphlets, our algorithm is based on derived relations between the counts, which decreases the time complexity by an order of magnitude in comparison with past approaches.
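
The same kind of relation exists for edges. The following Python sketch (not the R package's interface; `edge_orbits_3` is a hypothetical name, and the graph is assumed to be a dict of neighbour sets) describes each edge by how many induced 2-edge paths and how many triangles contain it. Triangles come from a neighbourhood intersection, and the path count is derived from the endpoint degrees rather than enumerated.

```python
def edge_orbits_3(adj):
    """Return {(u, v): (paths, triangles)} for every undirected edge with u < v."""
    out = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours of u and v close a triangle over the edge.
                triangles = len(adj[u] & adj[v])
                # Each other neighbour of u or v extends the edge to a path,
                # unless it is a common neighbour (counted once from each side).
                paths = (len(adj[u]) - 1) + (len(adj[v]) - 1) - 2 * triangles
                out[(u, v)] = (paths, triangles)
    return out


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
edge_vectors = edge_orbits_3(graph)
```

On sparse graphs the work per edge is bounded by the size of the smaller neighbourhood, which is what makes the derived count cheaper than listing every path.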

    Attribute Interactions in Medical Data Analysis

    There is much empirical evidence about the success of naive Bayesian classification (NBC) in medical applications of attribute-based machine learning. NBC assumes conditional independence between attributes. In classification, such classifiers sum up the pieces of class-related evidence from individual attributes, independently of other attributes. The performance, however, deteriorates significantly when the “interactions” between attributes become critical. We propose an approach to handling attribute interactions within the framework of “voting” classifiers, such as NBC. We propose an operational test for detecting interactions in learning data and a procedure that takes the detected interactions into account while learning. This approach induces a structuring of the domain of attributes; it may lead to improved classifier performance and may provide useful novel information for the domain expert when interpreting the results of learning. We report on its application in data analysis and model construction for the prediction of clinical outcome in hip arthroplasty.
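
To make the notion of an attribute interaction concrete, the sketch below uses interaction gain, an information-theoretic quantity, as a stand-in; it is not the operational test proposed in the paper, whose details the abstract does not give. All function names are hypothetical. A positive value means that two attributes jointly tell more about the class than the sum of what each tells alone, which is exactly the situation a classifier that sums per-attribute evidence cannot capture.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(a, b, c):
    """I(A, B; C) - I(A; C) - I(B; C) for attributes a, b and class c."""
    joint = list(zip(a, b))
    return (mutual_information(joint, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# XOR: neither attribute alone is informative, but together they determine the class.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor_class = [0, 1, 1, 0]
copy_class = a[:]  # class depends on `a` only, so there is no interaction

gain_xor = interaction_gain(a, b, xor_class)
gain_copy = interaction_gain(a, b, copy_class)
```

In the XOR case the gain is one full bit, while in the second case it is zero, so only the first pair would be flagged as interacting under this measure.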

    FragViz: visualization of fragmented networks

    Background: Researchers in systems biology use network visualization to summarize the results of their analysis. Such networks often include unconnected components, which popular network alignment algorithms place arbitrarily with respect to the rest of the network. This can lead to misinterpretations due to the proximity of otherwise unrelated elements. Results: We propose a new network layout optimization technique called FragViz which can incorporate additional information on relations between unconnected network components. It uses a two-step approach by first arranging the nodes within each of the components and then placing the components so that their proximity in the network corresponds to their relatedness. In the experimental study with the leukemia gene networks we demonstrate that FragViz can obtain network layouts which are more interpretable and hold additional information that could not be exposed using classical network layout optimization algorithms. Conclusions: Network visualization relies on computational techniques for proper placement of objects under consideration. These algorithms need to be fast so that they can be incorporated in responsive interfaces required by the explorative data analysis environments. Our layout optimization technique FragViz meets these requirements and specifically addresses the visualization of fragmented networks, for which standard algorithms do not consider similarities between unconnected components. The experiments confirmed the claims on speed and accuracy of the proposed solution.

    Improving generalisation of AutoML systems with dynamic fitness evaluations

    A common problem machine learning developers face is overfitting, that is, fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but this overfitting problem can persist. In fact, it can become more of a problem as we iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show we can still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation, at little extra cost over single k-fold, and far lower cost than typical repeated k-fold. The results show that, when time equated, the proposed fitness function results in significant improvement over the current state-of-the-art baseline method which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide essentially a free boost in generalisation/testing performance. Comment: 19 pages, 4 figures.
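
The abstract does not spell out how the dynamic fitness evaluation works. One plausible reading, sketched below purely as an assumption, is that the k-fold partition is re-drawn for every generation of the evolutionary search, so that over many generations a surviving pipeline is scored on many different partitions while each single evaluation still costs one k-fold run. The function names and the seeding scheme (`base_seed + generation`) are hypothetical.

```python
import random


def kfold_indices(n_samples, k, seed):
    """Shuffle sample indices with a seeded generator and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def splits_for_generation(n_samples, k, generation, base_seed=0):
    """Return k (train, test) index pairs; the partition changes with the generation.

    Assumption: tying the seed to the generation number is one simple way to get
    a fresh but reproducible partition per generation.
    """
    folds = kfold_indices(n_samples, k, seed=base_seed + generation)
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


gen0 = splits_for_generation(n_samples=20, k=5, generation=0)
gen1 = splits_for_generation(n_samples=20, k=5, generation=1)
```

Under this reading, averaging a pipeline's scores across the generations in which it survives would approximate repeated k-fold cross-validation without paying for the repeats up front.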

    Interactive Network Exploration with Orange

    Network analysis is one of the most widely used techniques in many areas of modern science. Most existing tools for that purpose are limited to drawing networks and computing their basic general characteristics. The user is not able to interactively and graphically manipulate the networks, select and explore subgraphs using other statistical and data mining techniques, add and plot various other data within the graph, and so on. In this paper we present a tool that addresses these challenges: an add-on for exploration of networks within the general component-based environment Orange.

    GenePath: a System for Automated Construction of Genetic Networks from Mutant Data

    Motivation: Genetic pathways are often used in the analysis of biological phenomena. In classical genetics, they are constructed manually from experimental data on mutants. The field lacks formalism to guide such analysis, and accounting for all the data becomes complicated when large amounts of data are considered. Results: We have developed GenePath, an intelligent assistant that mimics expert geneticists in the analysis of genetic data. GenePath employs expert-defined patterns to uncover gene relations from the data, and uses these relations as constraints that guide the search for a plausible genetic network. GenePath provides formalism to genetic data analysis, facilitates the consideration of all the available data in a consistent and systematic manner, and aids in the examination of the large number of possible consequences of a planned experiment. It also provides an explanation mechanism that traces back every finding to the pertinent data. GenePath was successfully tested on several genetic problems. Availability: GenePath can be accessed at http://genepath.org. Supplementary information: Supplementary material is available at http://genepath.org/bi-supp

    Web-enabled knowledge-based analysis of genetic data

    We present a web-based implementation of GenePath, an intelligent assistant tool for data analysis in functional genomics. GenePath considers mutant data and uses expert-defined patterns to find gene-to-gene or gene-to-outcome relations. It presents the results of analysis as genetic networks, wherein a set of genes have various influences on one another and on a biological outcome. In the paper, we particularly focus on its web-based interface and explanation mechanisms.